Abstract


Communicating Sequential Processes (CSP) is a paradigm for communication and synchronization
among distributed processes. The alternative construct is a key feature of CSP that allows
nondeterministic  selection  of  one  among  several  possible  communicants.  Previous  algorithms  for 
this  construct  assume  a  message  passing  architecture  and  are  not  appropriate  for  multiprocessor 
systems that feature shared memory. This paper describes a distributed algorithm for the alternative
construct that exploits the capabilities of a parallel computer with shared memory. The
algorithm  assumes  a  generalized  version  of  Hoare’s  original  alternative  construct  that  allows  output 
commands  to  be  included  in  guards.  A  correctness  proof  of  the  proposed  algorithm  is  presented 
to  show  that  the  algorithm  conforms  to  some  safety  and  liveness  criteria.  Extensions  to  allow 
termination  of  processes  and  to  ensure  fairness  in  guard  selection  are  also  given. 

Keywords: communicating sequential processes; alternative operation; shared memory multiprocessor;
parallel processing.
1 Introduction


Communicating Sequential Processes (CSP) is a well known paradigm for communication and synchronization
of a parallel computation [11,10]. A CSP program consists of a collection of processes
P1, P2, ..., Pn that interact by exchanging messages. These message passing primitives, called
input  and  output  commands,  are  synchronous  —  a  process  attempting  to  output  (input)  a  message 
to  (from)  another  process  must  wait  until  the  second  process  has  executed  the  corresponding  input 
(output)  primitive. 

An  important  feature  of  CSP  is  the  alternative  construct  which  is  based  on  Dijkstra's  guarded 
command [6]. This construct enables a process to nondeterministically select one communicant
among  many.  Each  alternative  operation  specifies  a  list  of  guards.  Each  guard  has  a  set  of  actions 
associated  with  it  that  cannot  be  executed  until  the  value  of  the  corresponding  guard  becomes 
True.  Each  guard  consists  of  a  sequence  of  boolean  expressions  and  an  optional  input  command 
(output  guards  were  not  allowed  in  the  original  specification  of  CSP).  A  guard  is  said  to  be  enabled 
if each of the boolean expressions preceding the input command evaluates to True. The value of
a  guard  is  True  if  the  guard  is  enabled  and  its  input  action  has  successfully  completed. 

Implementation  of  the  alternative  construct  on  a  multiple  processor  computer  has  been  the 
subject  of  much  research  [1,2,3,4,5,12,15,22].  It  has  been  argued  that  the  exclusion  of  output 
guards  in  the  original  definition  of  CSP  is  too  restrictive  and  sometimes  degrades  performance 
[3,15]. Algorithms that allow output guards in the alternative construct have been proposed [1,2,3,4].
Others  suggest  a  paradigm  similar  to  that  which  was  originally  proposed  [9,12,22].  All  of  the 
algorithms  reported  thus  far  assume  a  message-based  computer  architecture;  no  shared  memory 
is assumed. The principal contribution of this paper is to present an algorithm for implementing
the  alternative  construct  on  a  shared  memory  multiprocessor  and  to  prove  its  correctness.  To  the 
authors’  knowledge,  no  such  algorithm  has  previously  been  reported. 

CSP  does  not  assume  shared  memory  between  constituent  processes,  so  one  might  ask  why 
implementation  on  a  shared  memory  machine  is  an  issue.  Implementation  of  CSP  on  a  shared 
memory  architecture  is  an  important  question  for  several  reasons: 

• CSP has clean semantics that simplify proving the correctness of programs. It is a worthwhile
programming paradigm in its own right, independent of the underlying machine architecture.

• The message passing paradigm is a natural means of expressing programs in many application
areas that are well suited to shared memory machines. For example, distributed discrete
event simulation algorithms are usually described in terms of message passing paradigms [?,16],
and implementations on shared memory architectures have been described [21]. Similarly,
message passing is used extensively in object-oriented programming.

•  Shared  memory  machines  are  widely  available.  Multiprocessors  such  as  the  BBN  Butterfly™ 
[23]  and  Sequent  Balance™  are  available  from  the  commercial  sector,  and  numerous  shared 
memory  research  machines  such  as  IBM’s  RP3  [18]  and  the  University  of  Illinois’s  Cedar  [8] 
have  also  been  developed. 

• Shared memory architectures provide fast interprocessor communications. A complete interconnection
among processors is provided, avoiding costly store-and-forward communication
software in message-based architectures such as the Intel iPSC™ [20]. At present, parallel
processors using shared memory are more appropriate for applications requiring frequent
communication among the constituent processes.

Although one can clearly “retrofit” any message-based algorithm to a shared memory architecture
by building a suitable interface, this will often lead to an inappropriate and awkward
implementation. Existing message-based algorithms for the alternative construct are not appropriate
for a shared memory machine because (1) they do not exploit the facilities afforded by shared
memory, leading to an inefficient implementation; and (2) they require additional “system” processes
to respond to incoming messages (e.g., requests for rendezvous), resulting in unnecessary
context switching overhead. We will describe an algorithm for the CSP alternative construct that
exploits the facilities afforded by shared memory and avoids the aforementioned system processes.
This algorithm implements the “generalized” alternative construct that allows output guards.

The proposed algorithm uses the notion of total ordering among processes [3] to prevent deadlocks,
but applies this principle dynamically on transactions (defined later) rather than statically
as originally proposed. The shared memory architecture simplifies the task of maintaining globally
unique IDs. The status of a remote process can be interrogated directly, in contrast to the message-based
algorithms where message handshake and context switching overheads reduce the efficiency
of  the  implementation.  However,  because  processes  in  the  proposed  algorithm  concurrently  access 
shared  data,  great  care  must  be  taken  to  avoid  race  conditions.  Therefore,  we  provide  a  proof  of 
the  correctness  of  the  algorithm  according  to  safety  and  liveness  criteria  [14].  Modifications  are 
also  suggested  to  achieve  fairness  [7]. 

Finally,  the  algorithm  does  not  contain  any  inherent  hot  spots  [19].  The  few  global  variables 
that  are  shared  by  all  processes  are  not  accessed  with  sufficient  frequency  to  constitute  a  hot  spot. 
With the exception of these global variables, the algorithm is fully distributed and does not rely
on  any  centralized  controller. 

The  remainder  of  this  paper  is  organized  as  follows.  The  semantics  of  the  generalized  alternative 
construct are discussed first, followed by a description of the assumed machine architecture. The
proposed algorithm and a discussion of its operation are then presented. Other important issues
related  to  the  algorithm  are  then  discussed,  and  an  extension  to  handle  termination  of  processes  is 
described.  We  conclude  the  paper  with  a  proof  of  the  correctness  of  the  algorithm  followed  by  a 
discussion  of  fairness  issues. 

2  The  Alternative  Construct 

A  guard  of  the  alternative  construct  can  appear  in  one  of  two  possible  forms.  The  first,  called  the 
pure  boolean  form,  contains  no  I/O  command.  For  example,  in 

(x = 1 and y > 5) → z := z * 3;

the predicate to the left of the ‘→’ operator is a pure boolean guard. The second form, called the
I/O  guard  form,  contains  an  I/O  command  as  well  as  an  (optional)  boolean  part.  For  example,  in 

P1?x → z := z + 1;

the input guard P1?x requests input from process P1. The received data is assigned to the variable
x. Guards such as this which do not contain a boolean part are referred to as pure I/O guards. In
effect, the boolean part is the constant True. An I/O guard is said to be enabled if the boolean
part  is  True,  so  a  pure  I/O  guard  is  permanently  enabled. 

Consider  the  following  alternative  construct: 

$[\,G_i\,(i \in PB) \rightarrow S_i \;\Box\; G_j\,(j \in IO) \rightarrow S_j\,]$

where PB stands for the set of indices of all of the pure boolean guards and IO the set of indices
of all of the I/O guards. Whenever this alternative construct is executed, exactly one guard is
selected and the corresponding action (S_i or S_j) is executed. The selection is made according to
the  availability  of  the  guards.  For  pure  boolean  guards,  the  guard  is  said  to  be  available  if  it 
is  enabled,  i.e.,  if  the  boolean  part  evaluates  to  True.  For  I/O  guards,  the  guard  is  available 
if  it  is  enabled  and  the  process  associated  with  the  guard  is  also  ready  to  communicate  using 
the  complementary  I/O  command.  Because  we  assume  I/O  commands  only  appear  in  guards  of 
alternative  operations,  this  implies  the  remote  process  is  executing  an  alternative  operation  in  which 
the  corresponding  I/O  operation  is  part  of  an  enabled  guard.  If  more  than  one  guard  is  available, 
one  is  chosen  arbitrarily.  The  application  program  cannot  control  this  selection. 
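
The availability rule above can be sketched in a few lines of Python. This is only an illustration and not part of the original algorithm: the `Guard` record, the `is_available` function, and the `partner_ready` callback (a stand-in for whatever mechanism reports that the remote process offers the complementary command) are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Guard:
    enabled: bool                   # value of the boolean part (True for a pure I/O guard)
    partner: Optional[int] = None   # remote process ID; None marks a pure boolean guard
    is_output: bool = False         # True for an output command, False for an input command


def is_available(guard: Guard, partner_ready: Callable[[int, bool], bool]) -> bool:
    """Apply the availability rule to a single guard.

    partner_ready(pid, wants_output) is a hypothetical oracle. It reports whether
    process pid is executing an alternative with an enabled guard that holds the
    complementary command (an output if wants_output is True, an input otherwise).
    """
    if not guard.enabled:
        return False
    if guard.partner is None:       # pure boolean guard: enabled means available
        return True
    return partner_ready(guard.partner, not guard.is_output)
```

When several guards pass this test, the construct picks one arbitrarily; the sketch deliberately leaves that choice out.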

Pure  boolean  guards  can  be  resolved  without  any  interaction  with  other  processes.  Therefore, 
to  simplify  the  discussion  which  follows,  we  will  restrict  attention  to  the  resolution  of  I/O  guards. 


3  The  Machine  Architecture 

The  machine  is  assumed  to  be  a  shared  memory  multiprocessor.  The  algorithm  is  well  suited  for 
machines  such  as  BBN’s  Butterfly  or  Sequent’s  Balance,  among  others.  Several  primitives  are 
used  in  the  algorithm.  None  are  unusual  in  a  multiprocessor  environment,  and  all  can  be  easily 
constructed  using  a  test-and-set  and  standard  scheduling  primitives. 

The CSP program contains processes P1, P2, ..., Pn. Process Pi is assigned the unique process
ID  i  to  distinguish  it  from  others. 

We  will  assume  the  following: 

•  Communications  are  reliable.  An  error  free  communications  mechanism  exists  so  that  two 
distinct  processes  can  communicate  by  exchanging  a  message.  In  particular,  Send(M,  R) 
and  Recv(R):  Message  provide  the  same  semantics  as  CSP’s  output  and  input  commands, 
respectively.  M  is  the  message  which  is  transmitted  and  R  is  the  ID  of  the  remote  process 
with which communication is to take place. Recv returns the received message (of type
Message).  In  accordance  with  CSP  semantics,  we  assume  the  process  invoking  the  primitive 
blocks  until  process  Pr  executes  the  complementary  I/O  primitive. 

•  Read  and  write  accesses  to  shared  memory  are  atomic,  as  is  normally  the  case  with  a  shared 
memory multiprocessor. AtomicAdd(X): INTEGER atomically increments the integer
variable X and returns the original value of X.

• WaitForSignal and Signal primitives are available to block and unblock the process, respectively.
A signal contains a single, user defined integer value. WaitForSignal(): INTEGER
causes the process invoking the primitive to block until a signal becomes available to it from
any other process and returns the integer value stored within the signal. Signal(R, i) sends a
signal containing integer i to process Pr. The Signal primitive wakes up the signaled process
if  it  is  blocked  on  WaitForSignal.  Otherwise,  the  signal  remains  in  effect  until  Pr  executes  a 
WaitForSignal  primitive.  If  a  second  signal  is  sent  to  Pr  before  the  first  is  absorbed  by  a  call 
to  WaitForSignal,  the  first  signal  is  discarded. 

•  Lock  and  Unlock  primitives  provide  exclusive  access  to  shared  data  structures.  Lock(L) 
will block until the lock L becomes zero, at which time L is set to one. The “test-and-set”
operation  must  be  atomic.  Unlock(L)  sets  the  lock  L  to  zero.  Further,  we  assume  the 
Lock  primitive  is  fair,  i.e.,  if  a  process  is  blocked  while  attempting  to  obtain  a  lock,  it  does 
not  remain  blocked  an  unbounded  amount  of  time  unless  the  lock  is  not  unlocked  for  an 
unbounded  amount  of  time. 
• Sleep(T) causes the process invoking it to block for at least T time units. A process will
always  eventually  awake  after  calling  Sleep. 

•  The  amount  of  time  between  successive  samples  of  a  shared  memory  location  by  a  busy  wait 
loop  (which  does  nothing  but  sample  and  test  the  value  stored  in  this  location  for  inequality) 
can  be  bounded,  and  is  shorter  than  the  time  required  to  invoke  either  the  Send  or  Recv 
primitives  defined  above. 
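
As a rough illustration of the AtomicAdd and Signal/WaitForSignal semantics assumed above, the following Python sketch builds both from the standard `threading` module. The class names `AtomicCounter` and `SignalSlot` are hypothetical, and a mutex stands in for the hardware test-and-set; note also that Python's `threading.Lock` does not promise the fairness the paper assumes of its Lock primitive.

```python
import threading


class AtomicCounter:
    """AtomicAdd-style counter: increment and return the value held before."""

    def __init__(self, value=0):
        self._value = value
        self._mutex = threading.Lock()

    def atomic_add(self):
        with self._mutex:
            old = self._value
            self._value += 1
            return old


class SignalSlot:
    """One-slot signal buffer for a single process.

    A signal sent while the owner is not waiting stays pending; a second signal
    sent before the first is absorbed replaces (discards) the first.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = None            # None means "no signal pending"

    def signal(self, value):
        with self._cond:
            self._pending = value       # overwrite any unabsorbed signal
            self._cond.notify()

    def wait_for_signal(self):
        with self._cond:
            while self._pending is None:
                self._cond.wait()
            value, self._pending = self._pending, None
            return value
```

For example, after `slot.signal(3)` followed by `slot.signal(7)` with no intervening wait, a call to `slot.wait_for_signal()` returns 7 without blocking.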

This  final  “timing”  assumption  is  perhaps  the  most  distasteful  aspect  of  the  proposed  algorithm. 
It is not necessary to ensure the safety of the algorithm, i.e., if it were relaxed, no “invalid”
rendezvous would result. The assumption is primarily a theoretical requirement that is necessary to
prove  liveness  and  has  only  limited  practical  implications.  If  this  assumption  is  relaxed,  specific 
scenarios  requiring  a  prolonged,  highly  synchronous  behavior  between  independent  processes  must 
develop  to  violate  liveness.  Such  scenarios  are  unlikely  to  occur  in  practice,  as  will  be  discussed  in 
detail after the algorithm has been described, and precautions can be taken to reduce the likelihood
of  such  occurrences  if  the  timing  assumption  cannot  be  guaranteed. 

It  is  assumed  that  all  input  and  output  commands  occur  within  guards  of  the  alternative 
construct.  Simple  CSP  input  and  output  primitives  are  special  cases  of  the  alternative  construct. 
We  also  assume  that  the  variables  used  in  the  alternative  algorithm  are  not  modified  by  processes 
except  as  indicated  in  the  algorithm.  Finally,  it  is  assumed  that  processes  do  not  terminate.  The 
algorithm  can  be  extended  to  handle  termination,  as  will  be  discussed  later. 

4  The  Alternative  Algorithm 

Each  invocation  of  an  alternative  operation  is  referred  to  as  a  transaction.  A  transaction  begins 
when  an  alternative  operation  is  initiated  and  ends  when  a  successful  communication  has  been 
completed.  A  process  will  usually  engage  in  many  transactions  during  its  lifetime.  A  total  ordering 
is  imposed  among  all  transactions  entered  by  all  processes  of  a  given  CSP  program.  A  unique 
sequence  number,  referred  to  here  as  a  transaction  ID,  is  associated  with  each  transaction. 

Two processes that each initiate an alternative operation that results in a communication
between  them  are  said  to  rendezvous.  More  precise  definitions  of  rendezvous  and  other  terminology 
introduced  in  this  section  will  be  presented  later.  Each  rendezvous  always  involves  exactly  two 
distinct  processes.  In  a  typical  rendezvous,  the  first  process  to  enter  the  alternative  will  block, 
waiting  for  a  signal  from  the  second.  When  the  second  process  enters  the  alternative,  it  will 
commit to the first in order to obtain “permission” to rendezvous; the “committing” process will
then signal and exchange a message with the blocked process, and both will complete their respective
alternative operations.

A commit operation is, in effect, a request for rendezvous. It will be shown that a rendezvous
will  occur  only  after  a  successful  commit  operation  has  taken  place,  and  every  successful  commit 
results  in  a  rendezvous.  A  process  will  not  attempt  to  commit  until  it  has  determined  that  the 
process  with  which  it  is  committing  is  a  suitable  candidate  for  rendezvous,  i.e.,  each  lists  the  other 
in  their  respective  guard  lists,  and  the  two  processes  are  not  both  trying  to  execute  the  same  I/O 
operation (Send or Recv). The commit operation resolves conflicts when two different processes
attempt to simultaneously rendezvous with a third. The algorithm uses an “abort and retry”
mechanism  to  avoid  race  conditions  when  two  potential  communicants  simultaneously  enter  the 
alternative  command. 

4.1  Process  States 

Each  process  can  be  in  one  of  the  following  states: 

• WAITING. The process is blocked on a WaitForSignal operation, waiting for another process
to rendezvous with it.

•  ALT.  The  process  has  begun  an  alternative  operation,  and  is  scanning  through  its  list  of 
guards  to  find  a  process  with  which  it  can  rendezvous. 

•  SLEEPING.  The  process  was  forced  to  abort  an  alternative  operation.  After  aborting,  the 
process  goes  to  sleep  for  some  predetermined  period  of  time  before  retrying.  While  blocked 
in  this  way,  the  process  is  in  the  Sleeping  state.  This  state  differs  from  the  Waiting  state 
because  a  process  may  remain  in  the  latter  for  an  unbounded  amount  of  time. 

•  RUNNING.  The  process  is  executing  user  or  system  code  not  related  to  the  alternative 
operation.  The  process  is  in  the  Running  state  if  it  is  not  in  any  of  the  other  states  listed 
above. Once the process initiates an alternative operation, it can only be in the Waiting,
Alt,  or  Sleeping  state  until  the  alternative  operation  completes  with  a  rendezvous. 

It  is  possible  to  combine  the  Running  and  SLEEPING  states  into  a  single  state.  Two  states  are 
used  to  simplify  the  description  of  the  algorithm  and  its  proof. 

A  state  transition  diagram  for  each  process  is  shown  in  figure  1.  Initially,  a  process  is  in  the 
RUNNING  state.  Once  the  process  initiates  an  alternative  operation,  it  enters  the  ALT  state.  If 
the  process  is  forced  to  abort  the  alternative  it  switches  to  the  Sleeping  state,  and  returns  to  the 
ALT state when it retries. If the process is able to commit and rendezvous with another process,
it returns to the Running state. Otherwise, the process moves to the Waiting state until some
other  process  commits  to  it,  at  which  time  it  rendezvous  and  returns  to  the  Running  state. 

The  ALT  and  SLEEPING  states  should  be  viewed  as  “transitory”  states  through  which  a  process 
must  pass  while  trying  to  commit  or  move  into  the  Waiting  state.  It  will  be  shown  that  a  process 
cannot  remain  in  either  the  Alt  or  the  Sleeping  state  for  an  unbounded  amount  of  time  on  a 
single  transaction. 
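
The transitions just described can be written down as a small table. The sketch below is only an aid to reading the prose (figure 1 is not reproduced in this text); the `State` enumeration and the `is_legal` helper are hypothetical names.

```python
from enum import Enum


class State(Enum):
    RUNNING = "RUNNING"
    ALT = "ALT"
    SLEEPING = "SLEEPING"
    WAITING = "WAITING"


# Allowed successor states, taken from the description of the state diagram.
TRANSITIONS = {
    State.RUNNING: {State.ALT},                                # alternative initiated
    State.ALT: {State.SLEEPING, State.RUNNING, State.WAITING}, # abort, commit, or wait
    State.SLEEPING: {State.ALT},                               # retry after the timeout
    State.WAITING: {State.RUNNING},                            # another process committed
}


def is_legal(path):
    """Check that each consecutive pair of states in path is an allowed transition."""
    return all(b in TRANSITIONS[a] for a, b in zip(path, path[1:]))
```

A transaction that aborts once and then waits for a partner follows RUNNING, ALT, SLEEPING, ALT, WAITING, RUNNING, which `is_legal` accepts; a direct jump from RUNNING to WAITING is rejected.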

4.2  Shared  Variables 

Each  process  Pj  maintains  a  number  of  variables  that  may  be  examined,  and  in  some  cases  modified, 
by  other  processes: 

• AltListj lists the guards associated with the last alternative operation initiated by Pj that
caused  Pj  to  enter  the  Waiting  state. 

•  AltLockj  is  a  lock  used  to  control  access  to  AltListj.  It  is  initialized  to  0  (unlocked). 

• Statej holds the current state of Pj. It may be set to Waiting, Alt, Sleeping, or Running,
and is initialized to Running.

• WakeUpj is initialized to 1 and is set to zero by Pj whenever it enters the Waiting state.
It  is  incremented  (atomically)  by  processes  trying  to  commit  to  Pj.  This  variable  prevents 
two  processes  from  both  successfully  committing  to  a  third  on  a  single  transaction. 

There  is  also  one  system  wide  global  variable  used  by  the  algorithm: 

•  NextTransID  is  initialized  to  zero  and  is  incremented  each  time  a  process  initiates  an 
alternative  operation.  This  variable  ensures  a  unique  transaction  ID  can  be  generated  for 
each  instance  of  an  alternative  operation. 

One procedure merits special attention. CheckAndCommit(AltList_r, g_l): INTEGER is
called by process P_l (l denotes the local process) to check that “valid” communications can take
place between P_l using guard g_l and P_r (r denotes the remote process), and if so, to attempt to
commit to P_r. If a commit was attempted and succeeded, then CheckAndCommit returns a positive
integer indicating the corresponding guard in the remote process P_r. Otherwise, CheckAndCommit
returns a non-positive integer, denoted by the constant Failed. This procedure is shown in figure 2.

CheckAndCommit uses a procedure CheckGuard(AltList_r, g_l): INTEGER that scans the
remote alternative list AltList_r looking for a guard g_j that is matching and compatible with the
local guard g_l. By matching we mean g_j contains an I/O operation with P_l. By compatible we mean
g_l and g_j do not both contain input (output) commands. CheckGuard returns j, the number of a
matching and compatible guard if one was found, and Failed otherwise. If such a guard is found, P_l
attempts to commit to P_r by testing if WakeUp_r is zero, and if so, incrementing it. An ordinary
addition is used rather than the AtomicAdd primitive to increment WakeUp_r because AltLock_r
guarantees atomicity: every “test-and-set” operation performed on WakeUp_r occurs while AltLock_r
is set. If P_l is the first process to commit to P_r, i.e., if WakeUp_r was previously zero, then P_l
successfully commits, CheckAndCommit returns the number of the corresponding guard, and rendezvous
is imminent. Otherwise, CheckAndCommit returns Failed. AltLock_r ensures serial access to AltList_r.
As will be demonstrated later, it is crucial that this lock is not released until after the commit
operation is attempted (if it is attempted) in order to avoid race conditions. This would be the case
even if an AtomicAdd operation were used to increment the WakeUp variable.
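
Figure 2 is not reproduced in this text, so the following Python sketch reconstructs CheckGuard and CheckAndCommit from the prose alone. It is a minimal illustration under stated assumptions, not the authors' code: the `Guard` and `Process` records are hypothetical, and the functions take the whole remote process record plus the local process ID rather than the paper's (AltList_r, g_l) parameter list.

```python
import threading
from dataclasses import dataclass, field

FAILED = 0  # non-positive failure value, as in the text


@dataclass
class Guard:
    partner: int      # ID of the process named in the I/O command
    is_output: bool   # True for an output command, False for an input command


@dataclass
class Process:
    pid: int
    alt_list: list = field(default_factory=list)                    # AltList
    alt_lock: threading.Lock = field(default_factory=threading.Lock)  # AltLock
    wake_up: int = 1                                                # WakeUp; 0 = open for one commit


def check_guard(remote_guards, local_pid, local_guard):
    """Return the 1-based number of a matching, compatible remote guard, else FAILED."""
    for j, g in enumerate(remote_guards, start=1):
        matching = g.partner == local_pid                 # remote guard names the local process
        compatible = g.is_output != local_guard.is_output  # one input, one output
        if matching and compatible:
            return j
    return FAILED


def check_and_commit(remote, local_pid, local_guard):
    """Scan the remote list and attempt the commit without releasing the lock in between."""
    with remote.alt_lock:
        j = check_guard(remote.alt_list, local_pid, local_guard)
        if j == FAILED or remote.wake_up != 0:
            return FAILED
        remote.wake_up += 1   # ordinary addition; the lock supplies atomicity
        return j
```

Because the test and the increment of `wake_up` both happen while `alt_lock` is held, a second caller that arrives during the same remote transaction finds a nonzero value and receives `FAILED`.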

4.3  Other  Notation 

For notational convenience, other variables and predefined functions are defined that are used in
the algorithm. These include:

• TransIDj is a variable that contains the ID of the current transaction in which process Pj is
engaged.

• CommunicantID(gj) is a function that returns the ID of the process listed in the I/O
command portion of guard gj.

• Communicate(gj) executes the I/O command in guard gj.

• TimeOut is a constant indicating the number of time units a process sleeps after an
aborted attempt. More will be said about this later.

4.4  Description  Of  The  Algorithm 

The alternative algorithm is shown in figures 3 and 4. The Alternative procedure shown in figure 3
is a “front end” that is responsible for retrying aborted attempts. The heart of the algorithm lies
in the TryAlternative procedure shown in figure 4. The parameters passed to both procedures are
n enabled I/O guards g1, g2, ..., gn. Each guard contains either a single output or a single input
primitive. The Alternative procedure is only called after non-I/O guards have been evaluated and
are found to be False. This procedure does not return until a rendezvous has been completed, at
which time it returns an integer indicating the guard (g1, g2, ..., gn) that was eventually satisfied.


The Alternative procedure obtains a unique transaction ID by performing an AtomicAdd operation
on the global NextTransID variable. It then attempts to rendezvous by calling TryAlternative.
TryAlternative either returns the number of the guard on which a rendezvous occurred, or the
FAILED flag indicating the attempt must be retried. Each time TryAlternative fails, the process
enters the Sleeping state for at least TimeOut time units before retrying. The same transaction
ID remains in use despite one or more failed attempts. It will be shown that TryAlternative cannot
fail an unbounded number of times within a single transaction.
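
The front end described above amounts to a retry loop around a single ID allocation. The Python sketch below illustrates that shape only; figure 3 is not reproduced here, so the function signature, the dictionary used for the process record, the `TransactionCounter` class, and the value of `TIME_OUT` are all assumptions, and `try_alternative` is passed in as a parameter so that the loop can be read on its own.

```python
import threading
import time

FAILED = 0         # non-positive failure value
TIME_OUT = 0.001   # seconds; hypothetical stand-in for the TimeOut constant


class TransactionCounter:
    """Stand-in for NextTransID with an AtomicAdd-style operation."""

    def __init__(self):
        self._next = 0
        self._mutex = threading.Lock()

    def atomic_add(self):
        with self._mutex:
            old = self._next
            self._next += 1
            return old


def alternative(process, guards, try_alternative, counter):
    """Retry try_alternative under one transaction ID until some guard succeeds."""
    process["trans_id"] = counter.atomic_add()   # allocated once, never on a retry
    while True:
        guard_number = try_alternative(process, guards)
        if guard_number != FAILED:
            return guard_number
        process["state"] = "SLEEPING"
        time.sleep(TIME_OUT)                     # block for at least TIME_OUT
```

Keeping the allocation outside the loop is the point of the sketch: every retry presents the same transaction ID, so the transaction ages relative to newer ones.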

The  heart  of  the  alternative  algorithm  is  embodied  in  the  TryAlternative  procedure  (figure  4). 
In this procedure, l refers to the local process P_l, and r refers to the remote process P_r associated
with the guard that is being scanned.

After setting the state of the process to Alt, P_l examines each guard listed in the alternative
operation one after the other. Some action is then performed depending on the state of P_r.

If P_r is in the Running state, P_l simply advances to the next guard. In this case, P_r has not
yet entered a transaction and is not yet ready to rendezvous.

If P_r is in the Sleeping state, P_l again advances to the next guard. P_l advances because the
Alternative procedure guarantees that the Sleeping process (P_r) will eventually retry its alternative
operation. If P_l and P_r are destined to eventually rendezvous on this transaction, P_l will typically
proceed to the Waiting state, and P_r will later retry, commit, and rendezvous with P_l.

If P_r is Waiting, then P_r has already reached the rendezvous point so P_l attempts to rendezvous.
AltList_r is examined to make sure a valid communication can take place, and if so,
P_l attempts to commit. If successful, P_l will awaken P_r (by sending a signal) and rendezvous.
Otherwise, P_l advances to the next guard.

Finally, if P_r is in the Alt state, some special precautions must be taken to avoid race conditions.
This situation could result, for example, when P_l and P_r initiate an alternative operation
at approximately the same time. The two processes may or may not be destined to rendezvous,
however. In fact, P_r’s alternative operation may not even contain a guard with P_l as a communicant.

If two processes see each other in the Alt state, one will be forced to abort and retry the alternative,
while the other pauses within the current operation until the first aborts. The transaction
IDs  of  the  two  processes  are  used  to  determine  the  process  that  will  abort  and  the  process  that  will 
proceed.  A  process  with  a  smaller,  i.e.,  older,  transaction  ID  is  given  higher  priority.  This  protocol 
avoids  deadlock  situations  in  which  two  processes  attempting  to  communicate  with  each  other  both 
advance  to  the  Waiting  state. 
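
The arbitration rule reduces to a single comparison. The helper below is a hypothetical illustration (the name `must_abort` does not come from the paper) and relies on the stated property that transaction IDs are unique.

```python
def must_abort(my_trans_id: int, other_trans_id: int) -> bool:
    """Return True if the process holding my_trans_id must abort and retry.

    The smaller (older) transaction ID has priority, so the holder of the
    larger ID yields. IDs are assumed unique, so ties cannot occur.
    """
    return my_trans_id > other_trans_id
```

For any two distinct IDs exactly one side aborts: with IDs 7 and 12, the process holding 12 aborts and the process holding 7 pauses until it does.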

If  the  process  does  not  abort,  it  pauses  in  a  busy  wait  loop  until  the  remote  process  moves  out  of 
the Alt state. The remote process will either abort, changing to the Sleeping state, or rendezvous,
changing to the Running state. Later, it will be shown that one of these two possibilities must
eventually occur. Although the busy wait loop and abort-and-retry scenario might initially appear to
cause  wasted  time  that  could  be  better  spent  pursuing  other  activities,  it  is  anticipated  that  this 
situation  will  arise  infrequently  in  practice.  Performance  evaluations  using  empirical  techniques  are 
currently  in  progress  to  verify  that  this  is  the  case. 

It is interesting to note that the state of P_r may change immediately after P_l examines State_r.
It will be proven that the algorithm operates correctly despite this potential inconsistency. In
fact, it will be shown that the only locking that must be performed in the entire algorithm is that
associated with AltLock.

If P_l goes through its entire guard list without rendezvousing with another process, P_l enters the
Waiting state and calls WaitForSignal to block until another process commits to it. Before calling
WaitForSignal, however, P_l also sets AltList_l to contain the current guard list and “activates”
WakeUp_l by setting it to zero. After some process later commits to P_l, a signal is received,
a communication takes place, and TryAlternative returns the identity of the (local) guard that
rendezvoused. This information is sent to P_l in the signal that awakened it.

We should emphasize at this point that it is crucial that the operations listed in figures 2, 3,
and 4 be performed in exactly the order in which they appear. Seemingly minor changes such as
swapping the order of the statements

WakeUp_l := 0;

State_l := WAITING;

introduce a race condition that invalidates the correctness proof.

We note that the Lock operation preceding the statement that modifies AltList must remain
even if modification can be done atomically. The locking protocols in this and the CheckAndCommit
procedure are carefully designed to avoid race conditions. Finally, it is noteworthy that the statement
that sets WakeUp_l to zero need not be executed while AltLock_l is locked. The correctness
proof only requires that two processes do not both read a zero value from WakeUp_l during a single
transaction of P_l. This is guaranteed by the locking protocol used in CheckAndCommit.
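
Since figures 2 to 4 are not reproduced in this text, the following end-to-end Python sketch is a reconstruction from the prose of this section and should be read as an illustration under stated assumptions, not as the authors' procedure. Assumptions: guards are (partner ID, is_output) pairs; the message exchange performed by Communicate is omitted; after the busy wait the local process re-examines the remote state (the prose does not say what follows the wait); `time.sleep(0)` is added inside the busy wait only so that threads yield to one another; and `TIME_OUT` is an arbitrary value.

```python
import threading
import time

FAILED = 0
RUNNING, ALT, SLEEPING, WAITING = "RUNNING", "ALT", "SLEEPING", "WAITING"
TIME_OUT = 0.001                       # seconds; hypothetical value
_id_mutex = threading.Lock()
_next_trans_id = 0                     # NextTransID


class Process:
    def __init__(self, pid):
        self.pid, self.state, self.trans_id = pid, RUNNING, -1
        self.alt_list, self.alt_lock, self.wake_up = [], threading.Lock(), 1
        self.cond, self.pending = threading.Condition(), None

    def signal(self, value):           # Signal: one-slot, newest value wins
        with self.cond:
            self.pending = value
            self.cond.notify()

    def wait_for_signal(self):         # WaitForSignal
        with self.cond:
            while self.pending is None:
                self.cond.wait()
            value, self.pending = self.pending, None
            return value


def new_trans_id():                    # AtomicAdd(NextTransID)
    global _next_trans_id
    with _id_mutex:
        _next_trans_id += 1
        return _next_trans_id - 1


def check_and_commit(local, remote, guard):
    with remote.alt_lock:              # held across the scan and the commit
        for j, (partner, is_output) in enumerate(remote.alt_list, start=1):
            if partner == local.pid and is_output != guard[1]:
                if remote.wake_up != 0:
                    return FAILED
                remote.wake_up += 1
                return j
    return FAILED


def try_alternative(local, guards, procs):
    local.state = ALT
    for k, guard in enumerate(guards, start=1):
        remote = procs[guard[0]]
        while True:
            state = remote.state
            if state == WAITING:
                j = check_and_commit(local, remote, guard)
                if j != FAILED:
                    remote.signal(j)   # tell the waiter which of its guards fired
                    local.state = RUNNING
                    return k
            elif state == ALT:
                if local.trans_id > remote.trans_id:
                    local.state = SLEEPING      # younger transaction aborts
                    return FAILED
                while remote.state == ALT:      # older transaction busy waits
                    time.sleep(0)
                continue                        # assumption: re-examine the remote state
            break                               # RUNNING, SLEEPING, or failed commit
    with local.alt_lock:
        local.alt_list = list(guards)           # publish guards under AltLock
    local.wake_up = 0                           # must precede the state change
    local.state = WAITING
    k = local.wait_for_signal()
    local.state = RUNNING
    return k


def alternative(local, guards, procs):
    local.trans_id = new_trans_id()             # once per transaction
    while True:
        k = try_alternative(local, guards, procs)
        if k != FAILED:
            return k
        time.sleep(TIME_OUT)
```

A quick way to exercise the sketch is to run two threads, one calling `alternative(procs[1], [(2, True)], procs)` and the other `alternative(procs[2], [(1, False)], procs)`; both calls should return guard number 1 regardless of which thread starts first.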

5  Discussion 

Several aspects of the alternative algorithm presented above merit further discussion and are
taken up next.


5.1  Transaction  IDs 

The  algorithm  uses  dynamically  assigned  transaction  IDs  to  determine  the  “winner”  when  a  process 
finds another in the Alt state. Dynamic IDs are used rather than static process IDs to ensure
liveness.  Intuitively,  liveness  means  that  two  processes  that  “should”  rendezvous  eventually  will, 
while  safety  means  that  any  rendezvous  that  occurs  is  valid.  The  proposed  approach  avoids  scenarios 
in  which  a  process  is  repeatedly  forced  to  abort  and  retry  its  alternative  operation  an  unbounded 
number  of  times;  this  is  because  the  priority  of  a  transaction  automatically  increases  with  time  as 
other  transactions  are  allowed  to  complete  and  new  ones,  with  higher  IDs  and  correspondingly  lower 
priorities,  are  initiated.  Dynamic  transaction  IDs  guarantee  this  property  while  static  IDs  do  not. 
It  is  important  that  a  new  transaction  ID  is  only  allocated  when  an  alternative  is  first  initiated,  as 
is  done  in  figure  3,  and  not  when  an  existing  operation  is  retried.  The  use  of  dynamic  transaction 
IDs  is  further  justified  by  the  fact  that  global  variables  are  relatively  inexpensive  in  shared  memory 
architectures,  and  the  NextTransID  variable  is  not  referenced  with  sufficient  frequency  to  become 
a  hot  spot. 
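As a rough illustration of the ID discipline described above, the following Python sketch allocates an ID once per alternative operation and keeps it across retries. The names new_transaction_id, winner, and Alternative are hypothetical, and the lock-protected counter merely stands in for the shared NextTransID variable.

```python
import itertools
import threading

_next_trans_id = itertools.count(1)   # stands in for the shared NextTransID
_id_lock = threading.Lock()

def new_transaction_id():
    """Hand out strictly increasing IDs; later transactions get higher IDs."""
    with _id_lock:
        return next(_next_trans_id)

def winner(id_a, id_b):
    """Lower ID means older transaction, which means higher priority."""
    return min(id_a, id_b)

class Alternative:
    """Hypothetical record of one alternative operation."""
    def __init__(self):
        self.trans_id = new_transaction_id()   # allocated only at initiation
        self.retries = 0

    def retry(self):
        # The ID is deliberately left unchanged, so the operation's
        # relative priority can only improve as older ones complete.
        self.retries += 1
```

Reallocating the ID inside retry would let a repeatedly aborted operation fall behind newer ones indefinitely, which is the starvation scenario the text rules out.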

A  second  concern  is  overflow  of  the  NextTransID  variable.  Overflow  invalidates  the  liveness 
property  of  the  algorithm  because  a  transaction’s  priority  does  not  necessarily  increase  with  time. 
Also,  because  transaction  IDs  cannot  be  guaranteed  to  be  unique  after  overflow  has  occurred,  the 
arbitration  protocol  could  fail  (this  could  be  circumvented  by  appending  the  process  ID  to  the  least 
significant  portion  of  the  transaction  ID,  however).  In  any  event,  overflow  can  be  easily  avoided 
by using a variable of large precision. For example, a 64-bit variable will not overflow with 1000
processes, each initiating a new alternative construct every microsecond, in over 500 years!
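The figure quoted above can be checked with a short calculation. The sketch below assumes an unsigned 64-bit counter and a 365.25-day year; the constant names are illustrative only.

```python
PROCESSES = 1000
ALTS_PER_SECOND_PER_PROCESS = 1_000_000      # one alternative every microsecond
SECONDS_PER_YEAR = 365.25 * 24 * 3600        # assumed year length

ids_per_second = PROCESSES * ALTS_PER_SECOND_PER_PROCESS
years_to_overflow = 2**64 / ids_per_second / SECONDS_PER_YEAR
print(f"{years_to_overflow:.1f} years")      # roughly 584.5
```

A signed 64-bit counter would halve this figure, so the unsigned assumption matters for the "over 500 years" claim.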

5.2  The  Timing  Assumption 

We  earlier  required  the  following  assumption  to  ensure  liveness: 

The  amount  of  time  between  successive  samples  of  a  shared  memory  location  by  a  busy 
wait  loop  (which  does  nothing  but  sample  and  test  the  value  stored  in  this  location  for 
inequality)  can  be  bounded,  and  is  shorter  than  the  time  required  to  invoke  either  the 
Send or Recv primitives.

This assumption is necessary because the algorithm uses a polling loop to detect another process
leaving the Alt state. Suppose Pi is waiting for Pj to change to a new state. It is possible, albeit
unlikely, that Pj (1) modifies Statej, (2) rendezvous and resumes execution of user code or goes
to sleep for TimeOut units of time, and (3) reenters TryAlternative and changes Statej back to
ALT; all of this must occur without Pi noticing that Statej had been modified, so this activity must
occur between successive samples of Statej by Pi's polling loop. While it is true that this might
occasionally  occur  if  Pi  is  interrupted  during  its  polling  loop,  it  is  necessary  that  this  scenario  be 
repeated  an  unbounded  number  of  times  within  a  single  execution  of  the  polling  loop  to  compromise 
the  liveness  of  the  algorithm.  We  conjecture  that  it  is  highly  improbable  that  such  a  scenario  will 
occur  even  a  few  times  within  a  single  transaction.  Further,  we  emphasize  that  safety  remains 
guaranteed  even  if  the  above  assumption  is  relaxed,  so  no  ill  effects,  other  than  delays,  will  result 
should  this  scenario  occur  some  (finite)  number  of  times. 

As can be seen from figure 4, Pj must execute either the Sleep, Send, or Recv primitive after the
state of Pj is changed (to Sleeping or Running), i.e., during step (2) above. Therefore, as stated
in the above assumption, ensuring that the minimum execution time of each of these primitives
exceeds the time between successive samples of Pi's polling loop is sufficient to avoid the above
scenario (actually, the Sleep primitive is excluded because its minimum execution time is trivially
set). If the time between successive samples of the polling loop can be bounded, the minimum
amount of time required by the Send and Recv primitives can be easily modified to adhere to the
timing assumption through the introduction of a timed delay (e.g., by calling Sleep). However, one
would not expect introduction of such a delay to be necessary in most practical situations.

Assuming the time required by a remote memory reference is bounded, the time between successive
samples by the busy wait loop can be bounded by disabling interrupts during the polling
loop.  If  this  is  not  a  viable  alternative,  one  can  reduce  the  likelihood  of  entering  the  above  scenario 
by  introducing  randomness  into  the  program’s  temporal  behavior.  For  example,  a  random  sleeping 
period  may  be  selected  (with  some  minimum  value,  as  described  below)  when  a  process  is  forced  to 
abort.  This  will  reduce  the  likelihood  of  excessive  delays  caused  by  synchronized  behavior  between 
processes. 
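One way to picture the randomized sleeping period is sketched below. The function retry_delay and its parameters are hypothetical; the only properties taken from the text are that the delay is random and never falls below a fixed minimum.

```python
import random

def retry_delay(min_delay, spread, rng=random):
    """Sleeping period for an aborted alternative.

    The result is never shorter than min_delay (the lower bound the
    text requires); a uniformly distributed extra delay of up to
    `spread` desynchronizes processes that abort at the same moment.
    """
    return min_delay + rng.uniform(0.0, spread)
```

A process forced to abort would then sleep for retry_delay(TimeOut, spread) rather than exactly TimeOut, where the value of spread is a tuning choice that the text does not specify.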

5.3  Setting  the  Sleeping  Period 

The  “sleep  period”  before  a  retry  is  attempted,  i.e.,  TimeOut  in  figure  4,  must  be  sufficiently  long 
to  allow  the  “winning”  process  to  observe  that  the  sleeping  process  is  indeed  in  the  Sleeping  state. 
In  particular,  TimeOut  cannot  be  shorter  than  the  interval  between  successive  samples  in  the  busy 
wait  loop  executed  by  the  winner. 

On  the  other  hand,  an  excessively  long  sleeping  period  will  lead  to  an  inefficient  implementation. 
A  reasonable  TimeOut  value  is  the  time  required  for  a  few  remote  memory  references. 


5.4  Channel  I/O 

In many CSP implementations, interprocess communication is based on pre-allocated channels.
Each channel is a unilateral link between two communicating processes. The channel model facilitates
modularity, reusability, and hierarchical construction of programs because a program can
be “constructed” by interconnecting a group of constituent processes. The algorithm presented
above  can  be  adapted  to  the  channel  I/O  model  by  modifying  the  Send  and  Recv  primitives  and 
translating  port  identifiers  to  process  IDs.  The  CheckAndCommit  procedure,  for  instance,  must  be 
modified  to  check  for  matching  channels  rather  than  matching  process  IDs.  These  modifications 
are  a  simple  extension  of  the  proposed  algorithm. 

5.5 Termination

Termination is another important issue facing real implementations. This was not treated in the
previous discussion because it introduces obscurities into the description. The termination semantics
play an important role in CSP because they are the basis of the termination of the repetitive
command [11]. If an alternative operation is embedded within a repetitive command and no guard
of  the  alternative  can  become  true,  e.g.,  because  all  processes  associated  with  enabled  guards  have 
terminated,  the  repetitive  command  terminates.  If  no  such  repetitive  command  surrounds  the 
alternative  operation  and  it  is  found  that  no  guards  can  become  true,  an  error  results. 

In the context of the proposed algorithm, it is sufficient that the Alternative procedure determine
when no guards can become satisfied and return an appropriate flag denoting this situation. The
algorithm can be extended to handle termination by adding a shared variable called GuardCounti
to each process Pi and a new process state called Terminated. GuardCounti indicates the number
of I/O guards on which Pi might potentially rendezvous for the current transaction and contains
a meaningful value whenever Pi is in the Waiting state. It is equivalent to the number of guards
in AltListi. The GuardCounti variable is used to detect situations in which Pi cannot rendezvous
because all of the processes in its guards have terminated. This is the only case in which the
Alternative procedure will return without rendezvous.

Whenever a process Pj terminates, it marks its state as Terminated and then examines the
state of each of its neighboring processes, i.e., those processes which might communicate with Pj.
If Pj finds another process Pi in the ALT state, it executes a busy wait loop until Statei changes.
This is necessary because Pj cannot know if Pi saw that Pj had entered the Terminated state. If Pj
finds Pi in the Waiting state and AltListi contains a guard listing Pj as a communicant, then Pj
(atomically) decrements GuardCounti to indicate that one fewer guard is available for rendezvous.
No further action is required unless the decrement operation causes GuardCounti to become zero.

In this case, the terminating process must send Pi a special signal to indicate that Pi's alternative
operation can never rendezvous. Upon receiving this signal, the alternative operation in Pi will
return a special flag indicating the alternative operation completed without rendezvous.
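The notification step performed by a terminating process can be sketched as follows. This is an illustrative reconstruction rather than the paper's code: the class Waiter, the function notify_termination, the signal name, and the list used in place of the signal mechanism are all assumptions. The busy wait on neighbors in the ALT state and the interplay with the commit protocol are omitted, and each process is assumed to appear in at most one guard.

```python
import threading

class Waiter:
    """Hypothetical shared state of a process blocked in the Waiting state."""
    def __init__(self, pid, guards):
        self.pid = pid
        self.state = "WAITING"
        self.alt_list = list(guards)        # AltList
        self.guard_count = len(guards)      # GuardCount
        self.alt_lock = threading.Lock()    # AltLock
        self.signals = []                   # stands in for the signal mechanism

def notify_termination(terminated_pid, neighbor):
    """Run by a terminating process for one neighbor.

    Returns True if the special "cannot rendezvous" signal was sent.
    """
    with neighbor.alt_lock:                 # held until the decrement completes
        if neighbor.state != "WAITING":
            return False
        if terminated_pid not in neighbor.alt_list:
            return False
        neighbor.guard_count -= 1
        last_guard_gone = neighbor.guard_count == 0
    if last_guard_gone:
        neighbor.signals.append("NO_RENDEZVOUS_POSSIBLE")
    return last_guard_gone
```

Only the decrement that brings the count to zero triggers the special signal, so the waiting process is awakened at most once for this reason.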

When looking for a process with which to rendezvous, i.e., when scanning the status of neighboring
processes in the TryAlternative procedure, an I/O guard corresponding to a terminated process
is  skipped  in  the  same  way  processes  in  the  Running  or  Sleeping  state  are  skipped.  Such  guards 
are  excluded  from  AltListi  and  GuardCounti  should  the  process  fail  to  rendezvous  and  move  into 
the  Waiting  state.  If  all  I/O  guards  correspond  to  terminated  processes,  the  alternative  construct 
again  returns  a  flag  indicating  the  operation  completed  without  rendezvous. 

Finally, some precautions must be taken to avoid race conditions. The mechanism described
above to notify a Waiting process that it cannot rendezvous on any of its guards bears some
resemblance to the protocol used to commit to a process: the WakeUp variable is analogous to
GuardCount, and committing (by incrementing WakeUp) is analogous to decrementing GuardCount.
Therefore, it is not surprising that the precautions that are necessary to avoid race conditions are
similar. In particular, GuardCounti must be set before Pi sets Statei to Waiting but after Pi
modifies AltListi (see figure 4). Identical constraints apply regarding the moment at which WakeUpi
is set to zero. Finally, when Pj wishes to decrement GuardCounti, the same protocol that was
used in the CheckAndCommit procedure (see figure 2) to lock AltLocki must be used to decrement
GuardCounti, i.e., AltLocki must not be released until after the decrement operation has completed.

6  Proof  of  Correctness 

The  correctness  of  the  algorithm  is  established  by  proving  that  during  the  (potentially)  infinite 
execution  sequence,  all  processes  and  the  interplay  between  them  maintain  invariant  properties 
known as safety and liveness [14,17]. As described above, safety means that any rendezvous which
occurs  is  correct.  For  example,  it  is  not  possible  for  two  processes  to  rendezvous  which  do  not  each 
list  the  other  in  some  guard  of  their  respective  alternative  lists.  Liveness  ensures  that  two  processes 
which  should  rendezvous  eventually  will,  provided  of  course  each  does  not  first  rendezvous  with 
some  other  process.  These  terms  are  defined  more  formally  in  theorems  2  and  3.  Intuitively,  the 
safety  property  ensures  that  nothing  “bad"  will  happen,  while  liveness  ensures  something  “good” 
will  eventually  happen.  Together  they  guarantee  correct  operation  of  the  algorithm. 

Before beginning the proof, terminology that has been used informally until now will be defined
more precisely. These definitions are in terms of the alternative algorithm shown in figures 2, 3,
and 4. It is assumed throughout that the CSP program consists of a collection of processes P1,
P2, ..., Pn.

6.1 Definitions

1. A process Pi is said to enter a transaction Tr when Pi calls the Alternative function. It exits
transaction Tr when it returns from the function call. Pi(Tr) denotes the fact that Pi is in
Tr. Each transaction has a unique ID associated with it (r for transaction Tr) that is used to
form a total ordering among all transactions. A transaction need not terminate. For example,
the application program may deadlock.

2. A process Pi in transaction Tr is said to commit to process Pj if Pi(Tr) increments WakeUpj
from zero to one. The algorithm is such that every time WakeUpj is incremented, a commit
operation takes place.

3. A transaction Tr executed by process Pi is said to rendezvous with transaction Ts for process
Pj if either (a) Pi is in the Waiting state and receives a signal from Pj, or (b) Pi signals Pj
after committing to Pj. It will be shown that once a process rendezvous, it will exchange a
message, complete the current transaction and return to the Running state.

4. A signal sent by Pi to Pj is said to be pending if (1) it was sent but has not yet been
received by Pj, or (2) it was received, but has not yet been absorbed by Pj through a call to
WaitForSignal.

5. A communication between Pi and Pj is compatible if one process wishes to send, and the
other wishes to receive. Otherwise, the communication is said to be incompatible.

6. VARi(Tr) denotes the value of state variable VAR of process Pi during transaction Tr. For
example, AltListi(Tr) is the alternative list of Pi during transaction Tr. If significant, the
point in time during the transaction that is referred to will be stated explicitly.

7. The function prev(Tr) returns the ID of the transaction executed by the process which
immediately preceded Tr. The existence of Tr implies the termination of prev(Tr). Also,
prev^0(Tr) refers to Tr itself and prev^m(Tr) corresponds to the mth previous transaction
entered by Pj.

8. GuardListj(Tr) lists the guards that are passed as parameters to the alternative operation
executed by Pj on transaction Tr. We will take the liberty of giving GuardList a dual meaning:
it refers either to a list of guards or to a list of processes that are designated in the I/O
commands of these guards. The particular meaning that is intended will be clear from the
context.

6.2 The Safety Property


Lemmas  1  through  5  lead  to  theorem  1  which  states  that  no  race  conditions  arise  that  might  cause 
a  process  to  mistakenly  rendezvous  with  a  second  process  that  does  not  wish  to  rendezvous  with 
the  first.  Theorem  2  subsumes  theorem  1  and  ensures  that  the  algorithm  obeys  the  safety  property. 

Lemma 1 Pi(Tr) signals Pj iff Pi(Tr) commits to Pj.

Proof: This follows immediately from examination of the algorithm. A process only
sends a signal after it commits, and always sends a signal after it commits. ∎

This lemma implies that WakeUpj must be set to 0 before a signal can be sent to Pj. In
addition, at most one signal is sent to Pj each time WakeUpj is set to 0.

Lemma 2 At the beginning and at the end of each transaction entered by Pj, the following
conditions must hold:

(a) No signals sent to Pj are pending.

(b) WakeUpj is nonzero.

Proof: Use induction on m, the number of transactions entered by Pj.

Consider the first transaction (m = 1) executed by Pj. WakeUpj is initialized to 1.
Because WakeUpj can only be set to 0 by Pj during a transaction, WakeUpj must
remain nonzero up to at least the beginning of Pj's first alternative operation. No
process can commit to Pj until WakeUpj becomes 0, so by lemma 1, no signals can be
sent to Pj before its first transaction, and therefore none can be pending. Thus, (a)
and (b) are both true at the beginning of Pj's first transaction.

During any transaction, and in particular the first, Pj will either reset WakeUpj to 0
exactly once (just before entering the Waiting state), or not at all. If Pj does not
reset WakeUpj, then obviously WakeUpj is still nonzero at the end of the alternative
operation. No signal can be sent to Pj because no process can commit, so none are
pending.

If Pj does reset WakeUpj to 0, then at most one process can commit (and send a signal)
to Pj during this transaction. This is because (1) WakeUpj is set to 0 at most one time
during this transaction; (2) each process must obtain the lock AltLockj before it can
examine WakeUpj (see the CheckAndCommit procedure); (3) as soon as one process
reads a zero in WakeUpj, it increments it before releasing AltLockj; so (4) two processes
cannot both read a zero value from WakeUpj during a single transaction in Pj. Because
no two processes can see a zero value in WakeUpj during a single transaction, no two
processes can commit to Pj during this (or any) transaction. Therefore, according to
lemma 1, at most one signal will be sent to Pj during this transaction.

Pj always calls WaitForSignal after setting WakeUpj to zero. Therefore, the only signal
that could have been sent to Pj must have been absorbed by the WaitForSignal operation,
so none can be pending when the transaction completes (if it completes), satisfying
condition (a). Condition (b) must also be satisfied at the end of the transaction because
a process must commit before sending a signal to Pj, so WakeUpj must be nonzero before
the process can resume execution after calling WaitForSignal. Therefore, (a) and
(b) are again true at the end of the first alternative operation as well as at the beginning.

Inductive step: Assume lemma 2 is true on the mth transaction entered by Pj. We
will now show it is also true on the (m+1)st transaction. According to the inductive
hypothesis, no signals are pending at the end of the mth operation, and WakeUpj is
nonzero. Therefore, these conditions will remain true until the beginning of the (m+1)st
transaction because no process can commit to Pj until WakeUpj becomes 0. As noted
in the proof for m = 1, if (a) and (b) are true at the beginning of any transaction, they
will be true at the end of the transaction if it terminates. Therefore, (a) and (b) are
true at the end of the (m+1)st transaction entered by Pj. ∎

Lemma 3 Two processes, Pi and Pj, cannot both commit to a third process Pk during a single
transaction Tt entered by Pk.

This lemma was actually proven as part of the proof of lemma 2, but we include it as a separate
lemma for future reference. The proof relies on the fact that WakeUpk is not zero at the beginning
of the alternative operation and can be set to zero at most one time during a single transaction. The
atomicity of the commit operation (i.e., two read-modify-write sequences cannot be inappropriately
interleaved) guarantees that only a single process can commit to Pk during Tt.

Lemma 4 If Pi(Tr) commits to Pj, then Pj must have been in the Waiting state when Pi
committed to Pj, and Pj must remain in the Waiting state until Pj receives the signal sent by Pi that
results from this commitment.

Proof: According to the algorithm, Pi checks that Pj is in the Waiting state before
trying to commit to Pj. Let us assume Pj is in transaction Ts when Pi sees Pj in the
Waiting state. Therefore, it only remains to be shown that Pj is still in the Waiting
state when Pi commits, as well as when the signal is received. This must be the case,
however, because once Pj enters the Waiting state, it cannot change state until it first
receives a signal. By lemma 2a, there were no signals pending when transaction Ts
began. By lemma 3, no process other than Pi will commit to Pj during this transaction,
so no signals other than Pi's are sent to, or received by, Pj during this transaction.
Therefore, Pj cannot unblock from the WaitForSignal operation and therefore cannot
change state until receiving the signal sent by Pi. ∎

The preceding lemma shows that arbitrarily long delays may occur from the time Pi observes
that Pj is in the Waiting state until Pi's signal actually arrives at Pj. If the commit succeeded,
this lemma guarantees that nothing “interesting” will happen at Pj from the time Pi found it to
be waiting until the signal was received.

Lemma  5  No  signals  are  lost  in  the  alternative  algorithm. 

Proof:  By  lemma  2a,  no  signals  are  pending  at  the  beginning  of  each  transaction.  By 
lemma  3,  at  most  one  process  can  commit  during  a  transaction,  so  at  most  one  signal 
is  sent  (and  therefore  received)  during  a  transaction.  Thus,  a  signal  can  never  arrive 
during  a  transaction  while  another  has  already  been  received  but  is  still  pending,  so  no 
signals  are  ever  lost  during  a  transaction. 

No signals destined for a process Pj are lost between successive transactions of Pj because
none can be sent to Pj while it is in the Running state. This is true because (1)
a signal is only sent to Pj following a commit operation (lemma 1), (2) Pj must have
been in the Waiting state when the commit occurred (lemma 4), and (3) Pj must remain
in the Waiting state until the signal is received and absorbed by a WaitForSignal
operation (lemma 4). ∎

Theorem 1 If Pi(Tr) signals (rendezvous) Pj, then Pj must be in some transaction Ts both when
the signal is sent and when it is received. Further, Pj(Ts) rendezvous Pi(Tr).

Proof: By lemma 4, Pj must be in a transaction when the signal is sent and when it is
received, and remain in the Waiting state during this period. By lemma 5, Pi's signal
cannot be lost. By lemmas 1, 2a and 3, this is the only signal received by Pj during
transaction Ts, eliminating the possibility of Pj accepting another signal instead of Pi's.
Because Pj always executes WaitForSignal when in the Waiting state, the signal from
Pi must be received, implying Pj rendezvous with Pi. ∎

Theorem 2 (Safety) If Pi(Tr) commits to Pj(Ts), then the following properties must be true:

1. (Mutual consent) Pi(Tr) rendezvous Pj(Ts) and Pj(Ts) rendezvous Pi(Tr). In other words,
the two communicating parties agree each is rendezvousing with the other.

2. Pj ∈ GuardListi(Tr) and Pi ∈ GuardListj(Ts).

3. Communications between Pi(Tr) and Pj(Ts) are compatible.

4. Pi and Pj will eventually communicate, complete their transaction, and return to the Running
state.

5. There does not exist a third process Pk (k ≠ i and k ≠ j) such that Pk(Tt) rendezvous with
Pi(Tr) or Pk(Tt) rendezvous with Pj(Ts).

Proof:

1. Pi(Tr) commits to Pj(Ts), implying Pi(Tr) signals Pj(Ts) (lemma 1). This in turn
implies the mutual rendezvous according to theorem 1.

2. The first part, showing Pj ∈ GuardListi(Tr), can be proved by contradiction.
Suppose Pj ∉ GuardListi(Tr). Then Pi would not have committed to Pj because
Pi only scans those processes in GuardListi(Tr) (see the FOR loop in the
TryAlternative procedure), contradicting our original assumption that Pi committed
to Pj.

It only remains to be proven that Pi ∈ GuardListj(Ts). It is seen from the algorithm
that Pi checks AltListj just before committing to Pj, and AltListj is
set to hold GuardListj(Ts) just before Pj enters the Waiting state, and therefore
before the commit. However, an arbitrarily long delay may elapse from the
time Pi checked AltListj to the time it committed. We therefore need to confirm
that the value of AltListj that Pi checked is GuardListj(Ts) rather than
GuardListj(prev^m(Ts)) for some m > 0.

This will be proven by contradiction. Suppose Pi checked GuardListj(prev(Ts)).

This would imply that the following sequence of events must have occurred:

(a) Pi(Tr) checks GuardListj(prev(Ts)) (stored in AltListj);

(b) Pj(Ts) modifies AltListj so that it becomes GuardListj(Ts);

(c) Pj(Ts) sets WakeUpj(Ts) to 0; and

(d) Pi(Tr) commits to Pj(Ts).

Event (a) must take place by the aforementioned assumption, and event (d) must
take place by our original assumption that Pi(Tr) commits to Pj(Ts). Event (c) must
precede (d) because WakeUpj(Ts) must be reset to 0 before any commitment to
Pj(Ts) can occur (see definition of commit). Event (b) must precede (c) according
to the order in which operations are performed in the algorithm. Event (b) must
follow (a) in order to satisfy our supposition that Pi checked GuardListj(prev(Ts)).
However, this sequence of events is not possible because the locking protocol of
the procedure CheckAndCommit (used by Pi when checking AltListj) ensures that
AltListj is not modified after Pi checks it (event (a) above), but before Pi commits
(event (d)). Therefore, event (b) could not have occurred between (a) and (d), so
our assumption that Pi(Tr) examined GuardListj(prev(Ts)) must be incorrect.
Similarly, it is not possible that Pi(Tr) examined GuardListj(prev^m(Ts)) for any
m > 0.

3. Compatibility is checked when Pi(Tr) checks that it is in AltListj(Ts). Similarly,
this information is implicitly updated whenever AltListj is updated. Therefore,
this condition is satisfied using the same proof as was used in (2) to show Pi is in
GuardListj(Ts).

4. Once rendezvous occurs between Pi(Tr) and Pj(Ts), each process initiates a communication
with the other. Properties (2) and (3) above and the reliability assumption
regarding the communication mechanism guarantee that the communication
succeeds. Once this occurs, completion of the alternative operation immediately
follows.

5. Suppose Pk(Tt) rendezvoused with either Pi(Tr) or Pj(Ts). Recall a rendezvous
occurs by either sending or receiving a signal to or from another process (definition
of rendezvous), so there are four possibilities:

(a) Pk(Tt) received a signal from Pi(Tr);

(b) Pk(Tt) received a signal from Pj(Ts);

(c) Pk(Tt) sent a signal to Pi(Tr); or

(d) Pk(Tt) sent a signal to Pj(Ts).

We need not consider signals sent before Tr, Ts, or Tt but received during these
respective transactions because none can be pending when the transaction begins
(lemma 2a).

(a) Suppose Pk(Tt) rendezvoused because it received a signal from Pi during Tr
(signals generated by Pi outside Tr are not relevant). This implies Pi(Tr) sent
signals to two processes because our original assumption is that Pi(Tr) committed
to (and therefore signaled, according to lemma 1) Pj(Ts). It is clear from the algorithm
that a process can signal at most one other process on any given transaction
because any time a signal is generated, the transaction always completes without
calling the Signal procedure again (see figure 4). Therefore, Pk(Tt) could not have
received a signal from Pi(Tr).

(b) Suppose Pk(Tt) received a signal from Pj during Ts (signals generated by Pj
outside Ts are not relevant). This implies Pj(Ts) both sent a signal to Pk and
received a signal from Pi within a single transaction. If Pj(Ts) sent a signal, then,
according to the algorithm in figure 4, Pj must have rendezvoused and completed
the transaction without ever entering the Waiting state or setting WakeUpj(Ts)
to zero. This contradicts our original assumption that Pi(Tr) committed to Pj(Ts).

(c) Suppose Pk(Tt) signaled Pi(Tr). This implies Pi(Tr) both sent a signal to Pj
and received a signal from Pk within a single transaction. This latter signal must
have been preceded by Pk(Tt) committing to Pi (lemma 1). This commit must
have occurred during or before Tr. But Pk(Tt) could not have committed to Pi
during Tr because WakeUpi is never equal to zero during Tr. This is because, by
assumption, Pi(Tr) commits to Pj(Ts), so Pi(Tr) never enters the Waiting state.
(It is only then that the WakeUp variable is set to 0.) Also, Pk(Tt) could not have
committed to Pi before Tr and signaled Pi during Tr because this would violate
lemma 4. Therefore Pk(Tt) could not have sent a signal to Pi(Tr).

(d) Finally, Pk(Tt) could not have committed to (and therefore could not have
signaled) Pj during Ts because this would imply both Pk and Pi committed to Pj
within a single transaction, violating lemma 3. Pk(Tt) could not have committed to
Pj before Ts and signaled Pj during Ts because this would again violate lemma 4.

Thus, Pk(Tt) could not have signaled Pj(Ts) either. Therefore, Pk(Tt) could not
have rendezvoused with either Pi(Tr) or Pj(Ts), so the proof is complete. ∎

Note  from  the  proof  of  (2)  in  the  Safety  theorem  that  it  is  crucial  that  accesses  to  AltList  are 
controlled  by  locks,  and  that  the  act  of  checking  the  AltList  and  committing  is  atomic  to  ensure 
correct operation. Also note that the status of Pj may change immediately after Pi checks it. The
algorithm  operates  correctly  despite  this  inconsistency. 
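
The role of the lock can be illustrated with a minimal sketch in Python, assuming threads stand in for processes. The class WaitingProcess, the function names, the FAILED sentinel, and the equality test used for guard matching are illustrative assumptions and are not taken from figure 2. Because the test of WakeUp and its increment happen while one lock is held, at most one of several competing processes can commit to the same waiting process.

```python
import threading

FAILED = -1  # hypothetical sentinel standing in for the paper's Failed


class WaitingProcess:
    """Shared record of a process in the Waiting state (illustrative names)."""

    def __init__(self, alt_list):
        self.alt_lock = threading.Lock()  # plays the role of AltLock
        self.alt_list = alt_list          # guards published before waiting
        self.wake_up = 0                  # 0 means nobody has committed yet


def check_and_commit(remote, guard):
    """Return the matching guard number or FAILED; check and commit under one lock."""
    with remote.alt_lock:
        if guard not in remote.alt_list:  # simplified stand-in for CheckGuard
            return FAILED
        if remote.wake_up != 0:           # another process committed first
            return FAILED
        remote.wake_up += 1               # commit
        return remote.alt_list.index(guard) + 1


def race(n_threads=8):
    """Let n_threads processes try to commit to the same waiting process."""
    remote = WaitingProcess(alt_list=["g"])
    results = []
    results_lock = threading.Lock()

    def worker():
        outcome = check_and_commit(remote, "g")
        with results_lock:
            results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch, race(8) returns eight outcomes of which exactly one is a guard number. If the test and the increment were not covered by the same lock, two threads could both observe wake_up equal to 0 and both commit.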

6.3  The  Liveness  Property 

The liveness property guarantees that no deadlock or livelock situations can arise within the
alternative algorithm. Such situations can only be caused by an erroneous application program.
Lemmas 6 through 11 lead up to the liveness theorem (theorem 3).

Lemma 6 A process Pi will never return to the Running state after entering a transaction unless
a rendezvous occurred.


Proof: By inspection of the alternative algorithm, the process only returns to the
Running state when either: (a) Pi(Tr) signals Pj(Ts) or (b) after Pi(Tr) receives a
signal from Pj(Ts). In either case, Pi(Tr) rendezvoused with Pj(Ts). ∎


Lemma 7 A process Pi cannot remain blocked on a Lock operation in the alternative algorithm
for an unbounded amount of time.


Proof:  The  only  Lock  operation  performed  by  the  algorithm  is  to  serialize  accesses  to 
AltList.  However,  once  any  process  obtains  a  lock  on  any  AltList,  it  must  eventually 
release  that  lock  because  no  unbounded  loop  or  blocking  primitive  is  executed  before 
the  corresponding  Unlock  is  performed.  Therefore,  the  lock  cannot  remain  in  place  for 
an  unbounded  amount  of  time.  No  process  will  remain  blocked  attempting  to  obtain  a 
lock  for  an  unbounded  amount  of  time  because  every  lock  will  eventually  be  unlocked, 
and the Lock primitive is assumed to be fair. ∎


Lemma 8 Suppose Pi ∈ GuardListj(Ts) and Pj ∈ GuardListi(Tr), and their respective I/O
guards are compatible. Pi and Pj cannot both enter the Waiting state during transactions Tr
and Ts, respectively.


Proof: Proof by contradiction. Suppose both Pi and Pj enter the Waiting state on
Tr and Ts, respectively. Because Pi reached the Waiting state, it must be the case that
the last time Pi scanned the state of Pj before Pi entered the Waiting state, Statej was
either (1) Running, (2) Sleeping, or (3) Waiting but Pi failed to commit to Pj. (If Pi
successfully committed, they would have rendezvoused and completed the transaction
according to theorem 2.) Consider the third case. We will now show that Pj must
have been in a transaction preceding Ts for this case to apply. WakeUpj(Ts) is set to
0 before Statej is set to Waiting. Therefore, if Pi saw Pj in the Waiting state while
Pj was in transaction Ts, and Pi failed when it tried to commit, then it must be that
some third process committed to Pj during Ts (after WakeUpj(Ts) is set to
0 but before Pi attempted to commit). But this successful commit must have resulted
in a rendezvous, contradicting our original assumption that Pj blocked indefinitely in
the Waiting state while in Ts. Therefore, if case (3) applies, Pj must have been in a
transaction previous to Ts when Pi observed it to be in the Waiting state.

Similarly, Pj also reached the Waiting state, so Pi must have been in the Running,
Sleeping, or Waiting state for a previous transaction the last time Pj scanned Pi
before Pj entered the Waiting state. Pi and Pj could not have both scanned each
other at the same instant because each would have found the other in the Alt state.
Therefore, one scanned the other first. Without loss of generality, let us assume Pi
scanned Pj first. Pi(Tr) was in the Alt state when it scanned Pj, and because it did
not rendezvous or abort (the latter would require Pj to be scanned again, making this not
the last time Pi scanned Pj), Pi must have remained in the Alt state until it changed to
the Waiting state and blocked indefinitely. Therefore, when Pj later scanned Pi for the
last time, Pj must have seen Pi in either the Alt or the Waiting state for transaction
Tr. However, this contradicts the fact that Pj saw Pi in the Running, Sleeping, or
Waiting state for a previous transaction. Therefore, the original hypothesis that Pi
and Pj both entered the Waiting state must be false. ∎

Lemma 9 A process Pi cannot remain continuously in the Alt state during a single transaction
Tr for an unbounded amount of time.

Proof:  A  process  remains  in  the  Alt  state  while  it  is  scanning  the  processes  in  its 
GuardList  trying  to  find  one  which  is  ready  to  rendezvous.  If  none  is  found,  the  process 
proceeds  to  the  Waiting  state.  Because  GuardList  is  necessarily  bounded  in  length, 
we must show that a process does not spend an unlimited amount of time scanning a
particular  guard. 

Pi moves on to the next GuardList entry or eventually changes state when it finds the
process corresponding to the current guard is in either the Sleeping, Running, or
Waiting state. Therefore, we only need to consider scanning a process Pj which is also
in the Alt state. If TransIDj < TransIDi, then Pi aborts TryAlternative and changes
to the Sleeping state. Thus we need only examine the case TransIDi < TransIDj
(both cannot have the same ID). In this case, Pi enters a loop waiting for Statej to
change. In order for Pi to remain in this loop an unbounded amount of time, Pi must
continually sample Pj while Statej is Alt. There are three ways Pi's samples can
indicate Pj remains in the Alt state for an unbounded amount of time: (1) Pj is
also locked into the Alt state for an unbounded amount of time; (2) Pj repeatedly
aborts TryAlternative, changes to the Sleeping state, and then retries TryAlternative
(changing back to the Alt state) in perfect synchrony with Pi's samples of Statej;
or (3) Pj repeatedly rendezvous, changes to the Running state, and then initiates a
new alternative operation in perfect synchrony with Pi's samples of Statej. These are
exhaustive because a process can only return from TryAlternative after a rendezvous or
after an aborted attempt. Case (2) cannot occur, however, because the sleep period is
set to a time sufficiently large that successive samples by Pi will detect that Pj is in
the Sleeping state. Similarly, case (3) cannot occur because the minimum execution
time of the Send and Recv primitives is assumed to be larger than the time between
successive samples of the polling loop. Therefore, only case (1) remains.

The previous discussion shows that Pi can only remain in the Alt state scanning Pj an
unbounded amount of time if the following conditions hold: (1) TransIDi < TransIDj,
and (2) Pj remains continuously in the Alt state on the same transaction an unbounded
amount of time. By the same argument presented above, Pj will only remain in the Alt
state on a single transaction an unbounded amount of time if some other process Pk is in
Pj's GuardList, TransIDj < TransIDk, and Pk remains continuously in the Alt state
an unbounded amount of time. Continuing this logic, because the number of processes
is bounded, the original process Pi will only remain in the Alt state for an unbounded
time if a cycle of processes exists such that each is waiting for the next process in
the cycle to leave the Alt state. This would require that TransIDi < TransIDj <
TransIDk < ... < TransIDi, which is clearly not possible. Therefore, no such cycle
can exist, so Pi cannot remain continually in the Alt state for an unbounded amount
of time. ∎
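
The ordering argument can be checked on small cases with the following sketch. It assumes every listed process is in the Alt state and draws a busy-wait edge from Pi to Pj only when TransIDi < TransIDj, so each edge strictly increases the transaction ID and no path can return to its start. The dictionaries and the function names wait_edges and has_cycle are illustrative assumptions, not part of the algorithm.

```python
def wait_edges(trans_id, guard_list):
    """Edges (i, j) where Pi, in the Alt state, would busy-wait on Pj (also in Alt)."""
    return [(i, j)
            for i, partners in guard_list.items()
            for j in partners
            if j in trans_id and trans_id[i] < trans_id[j]]


def has_cycle(nodes, edges):
    """Depth-first search for a directed cycle."""
    succ = {v: [] for v in nodes}
    for a, b in edges:
        succ[a].append(b)
    state = {v: 0 for v in nodes}  # 0 unvisited, 1 on the current path, 2 finished

    def visit(v):
        state[v] = 1
        for w in succ[v]:
            if state[w] == 1 or (state[w] == 0 and visit(w)):
                return True
        state[v] = 2
        return False

    return any(state[v] == 0 and visit(v) for v in nodes)


# Three processes that all name each other in their guards.
trans_id = {"P1": 7, "P2": 3, "P3": 5}
guard_list = {"P1": ["P2", "P3"], "P2": ["P1", "P3"], "P3": ["P1", "P2"]}
edges = wait_edges(trans_id, guard_list)
print(edges, has_cycle(list(trans_id), edges))
```

Whatever distinct IDs are assigned, the process holding the largest ID has no outgoing edge, which is why the chain of waiting processes must end.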

Lemma 10 The TryAlternative procedure cannot return Failed an unbounded number of times
during a single transaction Tr in some process Pi.

Proof: TryAlternative returns Failed if and only if Pi scans another process Pj and
finds Pj is also in the Alt state, and TransIDj < TransIDi. The number of guards
in GuardList is finite, so if TryAlternative fails an unbounded number of times, it must
be that for some process Pj, the conditions Statej = Alt and TransIDj < TransIDi
persist for an unbounded amount of time.

Pj cannot remain continually in the Alt state for an unbounded amount of time in a
single transaction (lemma 9). Therefore, it must be the case that either (1) Pi finds
Pj in the Alt state for a different transaction an unbounded number of times; or (2)
within a single transaction, Pj repeatedly switches back and forth between the Alt and
Sleeping states for an unbounded number of times, and it so happens that every time
Pi retries TryAlternative and scans Pj, Pi finds that Pj is in the Alt state. In case (2),
TryAlternative must fail an unbounded number of times in Pj as well as Pi.


Case (1): This is not possible because each new transaction ID is larger than all previous
IDs. If Pi finds Pj in the Alt state for a new transaction an unbounded number of
times, this would imply there are an unbounded number of transaction IDs less than
TransIDi. This cannot be the case because transaction IDs are positive integers.

Case (2): An argument similar to that used in lemma 9 can be used here. Summarizing
the arguments presented thus far in this lemma, TryAlternative in Pi will only fail an
unbounded number of times if it also fails an unbounded number of times in some other
process Pj, where TransIDj < TransIDi. Similarly, Pj will only continue to fail if
some other process Pk exists which also continues to fail, and TransIDk < TransIDj.
Because the number of processes is bounded, a cycle of processes must exist such that
TransIDi > TransIDj > TransIDk > ... > TransIDi, which, of course, cannot
occur. Therefore, a process cannot fail the TryAlternative procedure an unbounded
number of times. ∎

Lemma 11 For each alternative operation initiated by Pi, Pi eventually either rendezvous with
some other process Pj and returns to the Running state, or moves to the Waiting state. In other
words, a process cannot remain in the Alt state in the same transaction for an unbounded amount
of time.

Proof: The only way a process cannot reach the Waiting state or rendezvous is
to remain continually in the Alt state, or switch back and forth between Alt and
Sleeping an unbounded number of times. The latter case implies TryAlternative fails
an unbounded number of times within a single transaction. Neither is possible according
to lemmas 9 and 10. ∎

Theorem 3 (Liveness) Suppose two processes Pi and Pj each initiate an alternative operation
and Pj ∈ GuardListi(Tr) and Pi ∈ GuardListj(Ts) and their communication requests are
compatible. If neither Pi nor Pj rendezvous with another process during their respective
transactions, Pi and Pj will eventually rendezvous with each other during Tr and Ts, respectively.

Proof: According to lemma 11, Pi and Pj must each eventually either rendezvous
or enter the Waiting state. They both cannot enter the Waiting state according to
lemma 8. Therefore, at least one of the two processes, say Pi, must rendezvous. By
assumption, Pi cannot rendezvous with any process other than Pj, so Pi must rendezvous
with Pj. By theorem 2, Pj must also rendezvous with Pi. Therefore, Pi and Pj must
eventually rendezvous with each other. ∎

7 Fairness


One issue regarding the alternative construct that has received considerable attention is fairness.
In particular, two types of fairness, weak and strong fairness, have been defined [7,24]. We call an
implementation of the alternative construct weakly fair if it can be guaranteed that during the
infinitely repetitive execution of an alternative command, a guard that remains continuously available
(i.e., enabled and the neighboring process is ready to communicate) will eventually rendezvous. An
implementation is said to be strongly fair if the implementation guarantees that any guard which
is available infinitely often (though not necessarily continuously as is the case in weak fairness) will
eventually rendezvous.

The algorithm shown in figures 2, 3, and 4 is not fair in either the weak or strong sense.
However, weak fairness can be achieved by modifying the algorithm so that the order in which the
TryAlternative procedure scans guards, which implies a certain prioritization of the guards, varies
from one call to the next so that each guard is eventually scanned first. More precisely, we modify
the algorithm as follows:

• The Alternative and TryAlternative procedures each receive all guards specified in the
alternative command as parameters. The original procedures assumed only enabled guards are
passed.

• A boolean flag is associated with each guard indicating whether or not it is enabled.

• Define a distinct integer variable for each alternative construct in a given CSP program. These
variables could be defined by the compiler. Associate with the mth alternative construct in
process Pi the variable Alti,m. Initially set to 0, this variable is incremented each time this
particular alternative construct is executed. It therefore indicates the number of times Pi has
invoked the corresponding alternative construct.

• The FOR loop in the TryAlternative procedure is modified so that it begins scanning guard
(Alti,m mod n) + 1 rather than the first guard, where n is the number of guards in the
alternative construct. The FOR loop is also modified to skip disabled guards. It executes
up to n iterations as before. The index variable of the FOR loop "wraps around" to 1 after
scanning the nth guard.
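
The modified scan order can be sketched as follows. The function name scan_order and its argument names are hypothetical; alt_count stands for the per-construct counter described in the third item, and the sketch assumes the construct has at least one guard.

```python
def scan_order(alt_count, enabled):
    """Guard numbers (1-based) in the order the modified FOR loop would visit them.

    alt_count: current value of the per-construct counter.
    enabled:   list of booleans; enabled[k - 1] is the flag of guard k.
    """
    n = len(enabled)
    start = alt_count % n           # 0-based index of guard (alt_count mod n) + 1
    order = []
    for step in range(n):           # up to n iterations, as before
        k = (start + step) % n      # wrap around after the nth guard
        if enabled[k]:              # skip disabled guards
            order.append(k + 1)
    return order


print(scan_order(0, [True, True, True, True]))   # [1, 2, 3, 4]
print(scan_order(5, [True, True, True, True]))   # [2, 3, 4, 1]
print(scan_order(2, [True, False, True, True]))  # [3, 4, 1]
```

Each increment of the counter rotates the starting guard by one position, so the priority implied by the scan order shifts on every execution of the construct.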

The  modified  algorithm  is  referred  to  as  the  Fair  Algorithm,  and  is  assumed  in  the  discussion 
which  follows. 


Theorem 4 (Fairness) Let Pi be blocked on an alternative operation (i.e., Pi is in the Waiting
state) in which some process Pj is listed in some enabled guard. Further, let us assume Pi does not
become unblocked through a rendezvous with any process other than Pj. Consider an alternative
construct A in Pj that has been executed m times and contains n guards, one of which (gv) contains
a compatible communication with Pi. If Pj now executes A at least n more times and gv is enabled
on each of these n invocations of A, then Pi and Pj will rendezvous before the (m + n)th execution
of A completes.

Proof: The theorem can be proved by contradiction. Assume Pi does not rendezvous
with Pj before the (m + n)th execution of A. For this to happen, Pj must continually
be rendezvousing with some other process(es) before it scans Pi, because the moment it
scans Pi, it will see that Pi is in the Waiting state and rendezvous with Pi. However,
the Fair Algorithm guarantees that within n executions of A, gv will become the first
guard that is scanned. When gv is scanned first, no other process can rendezvous with
Pj before Pj scans Pi, so a rendezvous between Pi and Pj must take place. ∎
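
The bound of n executions can be worked out directly. The sketch below assumes the scan of a given execution starts at guard (c mod n) + 1, where c is the counter value seen by that execution; the function name executions_until_first is hypothetical.

```python
def executions_until_first(m, n, v):
    """Smallest k >= 0 such that the execution seeing counter m + k starts at guard v."""
    # Guard v is scanned first when (m + k) mod n == v - 1.
    return (v - 1 - m) % n


# With n = 4 guards and a counter of m = 6, guard 2 is scanned first after
# 3 further executions, since (6 + 3) mod 4 + 1 = 2.
print(executions_until_first(6, 4, 2))  # 3
```

Because the result is always between 0 and n - 1, every guard is scanned first exactly once in any n consecutive executions, which is the property the proof relies on.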

The  following  corollary  follows  immediately  from  this  theorem: 

Corollary  1  In  an  infinitely  repetitive  execution  of  an  alternative  construct,  a  guard  cannot  remain 
continually  available  for  an  unbounded  amount  of  time  without  eventually  rendezvousing. 

This  shows  that  the  Fair  Algorithm  is  weakly  fair.  It  demonstrates,  for  instance,  that  a  process 
waiting  to  be  served  by  another  process  cannot  be  continuously  denied  service  for  an  unbounded 
amount  of  time.  The  Fair  Algorithm  is  not  strongly  fair,  however.  Modification  of  this  algorithm 
to  one  which  is  strongly  fair  is  an  open  question.  None  of  the  alternative  algorithms  that  have  been 
developed  thus  far  (based  on  message-passing  architectures)  is  strongly  fair. 

8  Conclusions 

We  have  presented  an  algorithm  that  implements  the  generalized  alternative  construct  in  CSP. 
Unlike  previous  algorithms,  this  is  based  on  a  shared  memory  architecture.  It  has  been  shown  that 
the  algorithm  maintains  the  safety  and  liveness  properties  required  by  any  correct  implementation. 
Extensions  to  the  algorithm  that  allow  processes  to  terminate  and  guarantee  weak  fairness  were  also 
presented.  An  implementation,  written  in  C,  has  been  developed  for  a  16-processor  BBN  Butterfly 
parallel processor. Empirical performance evaluation of this implementation is in progress.

Figure 1: The state diagram of a process.

/* r is the remote process */

PROCEDURE CheckAndCommit(AltListr, gi): INTEGER;
VAR
    INTEGER GuardNumber;  /* number of matching guard */
BEGIN
    Lock(AltLockr);

    /* check guard matches and is compatible */
    GuardNumber := CheckGuard(AltListr, gi);
    IF (GuardNumber = FAILED) THEN
        Unlock(AltLockr);
        RETURN (FAILED);

    /* try to commit */
    ELSEIF (WakeUpr = 0) THEN
        WakeUpr := WakeUpr + 1;
        Unlock(AltLockr);
        RETURN (GuardNumber);
    ELSE
        Unlock(AltLockr);
        RETURN (FAILED);
    END;
END CheckAndCommit;


Figure  2:  Procedure  to  check  that  a  potential  communication  is  valid  and,  if  so,  to  commit.  The 
CheckGuard  function  returns  the  number  of  a  matching  (and  compatible)  remote  guard  or  returns 
Failed if none was found.

/* gi are enabled I/O guards */

PROCEDURE Alternative(g1, ..., gn): INTEGER;
VAR
    INTEGER ReturnValue;  /* indicates guard that rendezvoused */
BEGIN
    /* i is the local process id */
    TransIDi := AtomicAdd(NextTransID);
    ReturnValue := FAILED;
    WHILE (ReturnValue = FAILED) DO
        ReturnValue := TryAlternative(g1, ..., gn);
        IF (ReturnValue = FAILED) THEN Sleep(TimeOut); END;
    END;
    RETURN (ReturnValue);
END Alternative;


Figure 3: The "front end" procedure. TryAlternative returns the number of the guard on which a
rendezvous took place or Failed if it aborted.

PROCEDURE TryAlternative(g1, ..., gn): INTEGER;
VAR
    BOOLEAN flag;
    INTEGER GuardNumber;  /* corresponding guard of Pr */
    INTEGER i, r;
BEGIN
    Statei := ALT;

    /* look for rendezvous with a waiting process. */
    FOR i := 1 TO n DO
        r := CommunicantID(gi);
        flag := TRUE;
        WHILE (flag) DO
            CASE Stater DO  /* The remote process state. */
                RUNNING:  flag := FALSE;
                SLEEPING: flag := FALSE;  /* try next guard */
                WAITING:  GuardNumber := CheckAndCommit(AltListr, gi);
                          IF (GuardNumber = FAILED) THEN
                              flag := FALSE;  /* try next guard */
                          ELSE  /* Wake up Pr */
                              Statei := RUNNING;
                              Signal(r, GuardNumber);
                              Communicate(gi);
                              RETURN (i);
                          END;
                ALT:      IF (TransIDi < TransIDr) THEN
                              WHILE (Stater = ALT) DO END;  /* busy wait loop. */
                          ELSE
                              Statei := SLEEPING;
                              RETURN (FAILED);  /* abort... */
                          END;  /* if-then-else */
            END;  /* case statement */
        END;  /* while loop */
    END;  /* for statement */

    /* couldn't find guard to rendezvous */
    Lock(AltLocki); AltListi := (g1, ..., gn); Unlock(AltLocki);
    WakeUpi := 0;  /* first to commit gets rendezvous */
    Statei := WAITING;
    i := WaitForSignal();  /* Blocks */
    Statei := RUNNING;
    Communicate(gi);
    RETURN (i);
END TryAlternative;


Figure 4: The TryAlternative procedure attempts to rendezvous with a process listed in an I/O
guard, and does not return until a rendezvous takes place or the attempt is aborted.